How to suppress the shot noise in galaxy surveys 
Galaxy surveys are one of the most powerful means to extract cosmological information, and
for a given volume the attainable precision is determined by the galaxy shot noise $\sigma_n^2$ relative to the
power spectrum P. It is generally assumed that shot noise is white and given by the inverse of the 
number density $\bar{n}$. In this paper we argue one may be able to considerably improve upon this: in the
halo picture of cosmological structure all of the dark matter is in halos of varying mass and galaxies
are formed inside these halos. For the dark matter, however, mass and momentum conservation guarantee
that nonlinear effects cannot develop a white noise in the dark matter power spectrum on large
scales. This suggests that with a suitable weighting a similar effect may be achieved for galaxies, 
suppressing their shot noise. We explore this idea with N-body simulations by weighting central halo 
galaxies by halo mass and find that the resulting shot noise can be reduced dramatically relative 
to expectations, with a factor of 10-30 suppression at the highest number density of $\bar{n} = 4 \times 10^{-3}\,(h/\mathrm{Mpc})^3$
resolved in our simulations. For specific applications other weighting schemes may achieve even
better results, and for $\bar{n} = 3 \times 10^{-4}\,(h/\mathrm{Mpc})^3$ we can reduce $\sigma_n^2/P$ by up to a factor of 10 relative to
uniform weighting. These results open up new opportunities to extract cosmological information in 
galaxy surveys, such as the recently proposed multi-tracer approach to cancel sampling variance, and 
may have important consequences for the planning of future redshift surveys. Taking full advantage 
of these findings may require a better understanding of the galaxy formation process to develop accurate
tracers of the halo mass. 

PACS numbers: 98.80 
Galaxy clustering has been one of the leading methods
to measure the clustering of dark matter in the past
and with upcoming redshift surveys such as SDSS-III and 
JDEM/EUCLID this will continue to be the case in the 
future. Galaxies are easily observed and by measuring 
their redshift one can determine their three-dimensional 
distribution. This is currently the only large scale structure
method that provides three-dimensional information.
On large scales galaxies trace the dark matter up to a 
constant of proportionality called bias b, so the galaxy 
power spectrum can be directly related to the dark mat- 
ter power spectrum shape, which contains a wealth of 
information such as the scale dependence of primordial 
fluctuations, signatures of massive neutrinos and mat- 
ter density etc. In recent years the baryonic acoustic 
oscillations (BAO) feature in the power spectrum has 
been emphasized, which can be used as a standard ruler 
and in combination with cosmic microwave background 
anisotropies can provide a redshift distance test [1].

For the power spectrum measurement there are two 
sources of error: one is the sampling (sometimes called 
cosmic) variance, the fact that each mode is a gaussian 
random realization and all the cosmological information 
lies in its variance, which cannot be well determined on 
the largest scales because the number of modes is finite. 
The second source of error is the shot noise due to the discrete
sampling of galaxies, $\sigma_n^2$, which under the standard
assumption of Poisson sampling equals the inverse of the
number density $\bar{n}$. The total error on the power spectrum
$P$ is $\sigma_P/P = (2/N)^{1/2}(1 + \sigma_n^2/P)$, where $N$ is the number
of modes measured and scales linearly with the volume of 
the survey. While the above expression suggests there is 
not much benefit in reducing the shot noise to $\sigma_n^2/P \ll 1$
since sampling variance error remains, recent work sug- 
gests there are potential gains in that limit, since we may 
be able to reduce the damping of the BAO better Q. 
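
As a rough numerical illustration of the error expression above, the following minimal sketch evaluates $\sigma_P/P = (2/N)^{1/2}(1 + \sigma_n^2/P)$ for a few values of $\sigma_n^2/P$. The function name and the mode count $N = 2000$ are hypothetical choices made for the example, not values taken from the simulations.

```python
import math


def fractional_power_error(n_modes, noise_to_power):
    """Return sigma_P / P = sqrt(2 / N) * (1 + sigma_n^2 / P)."""
    if n_modes <= 0:
        raise ValueError("n_modes must be positive")
    return math.sqrt(2.0 / n_modes) * (1.0 + noise_to_power)


# Hypothetical band containing N = 2000 independent modes.
N_MODES = 2000
for ratio in (1.0, 0.1, 0.0):
    err = fractional_power_error(N_MODES, ratio)
    print(f"sigma_n^2/P = {ratio:4.1f} -> sigma_P/P = {err:.4f}")
```

Going from $\sigma_n^2/P = 1$ to $\sigma_n^2/P \ll 1$ halves the error at fixed $N$, after which only the sampling variance term remains.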

Recently a new multi-tracer method has been devel- 
oped where by comparing two differently biased tracers of 
the same structure one can extract cosmological informa- 
tion in a way that the sampling variance error cancels out 
[3]. There are several applications of this method, such
as measuring the primordial non-gaussianity [|[ , redshift 
space distortion parameter $\beta$ or the relation between the
Hubble parameter and the angular diameter distance [1] . 
In all these applications one can achieve significant gains 
in the error of the extracted cosmological parameters if 
$\sigma_n^2/P \ll 1$. Thus in all of these applications the galaxy
shot noise relative to the power spectrum is the key quan- 
tity that controls the ultimate level of cosmological pre- 
cision one can achieve with galaxy surveys. 

The relation between the galaxy and the dark mat- 
ter clustering can be understood with the halo model 
[E B 01 1 where all of the dark matter is divided into 
collapsed halos of varying mass. There are two contribu- 
tions to the dark matter clustering: first is the correlation 
between two separate halos, which is assumed to be pro- 
portional to the linear theory spectrum times the product 
of the two halo biases, while the second contribution is 
the one halo term which includes the clustering contri- 
butions from the individual halo itself. One obtains the 
dark matter power spectrum prediction by adding up the
contributions from all the halos. Since galaxies are as- 
sumed to form inside the halos one can write analogous 
expressions for galaxy clustering power spectrum once 
one specifies the occupation distribution of galaxies as a 
function of halo mass. 

One consequence of the halo model is that the one halo 
term is dominated by the most massive halos and reduces 
to white noise, $k^0$, for very small wavemode amplitude
$k \ll R^{-1}$, where $R$ is the size of the largest halos. For
galaxies this is believed to be a valid description of the 
shot noise amplitude in the low k limit. It distinguishes 
between the galaxy and the halo number density, but for 
a typical survey the fraction of halos with more than one 
galaxy in it is small, 5-30% @, and here we will ignore 
this distinction and assume for simplicity there is only 
one galaxy in each halo at its center. 

For the dark matter, the nonlinear evolution of struc- 
ture requires local mass and momentum conservation and 
as a result the low $k$ limit of the nonlinear contribution is predicted
to scale as $k^4$ and not $k^0$ Q. This is indeed seen
in simulations [10], making this prediction of the halo
model invalid. While this is often seen as a deficiency of 
the halo model, here we take it as an opportunity: if the 
dark matter has no white noise tail in the $k \to 0$ limit
then in the context of the halo model where all the dark
matter is in the halos and the halo size becomes irrelevant
in the $k \ll R^{-1}$ limit it should be possible to achieve
the same effect with galaxies, if one can enforce the lo- 
cal mass and momentum conservation. The most natural 
possibility is to weight the galaxies by the halo mass. 

The purpose of this letter is to explore this idea with 
numerical simulations. We employ a suite of large N-body
simulations run with the Gadget II code, which includes
four simulations with $1024^3$ particles in a $(1.6\,h^{-1}\mathrm{Gpc})^3$ box and one
simulation with $1536^3$ particles in a $(1.3\,h^{-1}\mathrm{Gpc})^3$ box. The
fiducial cosmological model has a scale invariant spectrum
with amplitude $\sigma_8 = 0.81$, matter density $\Omega_m = 0.28$
and Hubble parameter $H_0 = 70$ km/s/Mpc. We
ran a Friends of Friends halo finder and kept all the halos
with more than 20 particles, with the lowest halo mass
of $6 \times 10^{12}\,h^{-1} M_\odot$ and $10^{12}\,h^{-1} M_\odot$, respectively.

If a tracer has an overdensity $\delta_h$ with a bias $b_h$, then
the relation to the dark matter overdensity $\delta_m$ in Fourier
space can be written as $\delta_h = b_h \delta_m + n$, where $n$ is shot
noise with a power spectrum $\langle n^2 \rangle = \sigma_n^2$ and we assume
it is uncorrelated with the signal, i.e. $\langle \delta_m n \rangle = 0$ (the
operations should be taken separately on real and imaginary
components of the Fourier modes). Thus we define
$\sigma_n^2 = \langle (\delta_h - b_h \delta_m)^2 \rangle$ and bias is $b_h = (P_{hh}/P_{mm})^{1/2} = P_{hm}/P_{mm}$,
where $P_{hh} = \langle \delta_h^2 \rangle - \sigma_n^2$, $P_{hm} = \langle \delta_m \delta_h \rangle$
and $P_{mm} = \langle \delta_m^2 \rangle$. This is equivalent to choosing $\sigma_n^2$
such that the cross correlation coefficient is unity,
$r = P_{hm}/(P_{hh} P_{mm})^{1/2} = 1$. Thus our definition of the shot
noise includes all sources of stochasticity between the 
halos and the dark matter, so it is the most conserva- 
tive. This can be done as a function of k and so al- 
lows for a possibility that noise is not white. We do 
not assume a constant bias, although we find that for 
$k \ll 0.1\,h/\mathrm{Mpc}$ this is generally true. Another way to
define the shot noise is through the power spectrum fluctuations,
$\langle (\delta_h^2 - P_{hh} - \sigma_n^2)^2 \rangle = (2/N)\,[P_{hh}^2 + (\sigma_n^2)^2]$. We
find this definition in general has larger variance, but is 
on average in agreement with the definition above, which 
we will use in the following. 
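
The first definition above can be sketched numerically as follows. This is only an illustration on synthetic Gaussian modes, not the analysis pipeline used for the simulations: the function names, the input values ($P_{mm} = 100$, $b_h = 2$, $\sigma_n^2 = 5$) and the convention of averaging $|\cdot|^2$ over complex modes are all assumptions made for the example.

```python
import numpy as np


def bias_and_shot_noise(delta_h, delta_m):
    """Estimate (b_h, sigma_n^2) from complex Fourier modes in one k bin.

    b_h = P_hm / P_mm and sigma_n^2 = <|delta_h - b_h * delta_m|^2>.
    """
    delta_h = np.asarray(delta_h)
    delta_m = np.asarray(delta_m)
    p_mm = np.mean(np.abs(delta_m) ** 2)
    # Real part of the cross spectrum between tracer and matter.
    p_hm = np.mean((delta_h * np.conj(delta_m)).real)
    b_h = p_hm / p_mm
    sigma2 = np.mean(np.abs(delta_h - b_h * delta_m) ** 2)
    return b_h, sigma2


def complex_gaussian(rng, size, power):
    """Draw complex Gaussian modes with <|x|^2> = power."""
    re = rng.standard_normal(size)
    im = rng.standard_normal(size)
    return np.sqrt(power / 2.0) * (re + 1j * im)


# Synthetic check with made-up inputs: P_mm = 100, b_h = 2, sigma_n^2 = 5.
rng = np.random.default_rng(0)
delta_m = complex_gaussian(rng, 200_000, 100.0)
noise = complex_gaussian(rng, 200_000, 5.0)
delta_h = 2.0 * delta_m + noise

b_est, sigma2_est = bias_and_shot_noise(delta_h, delta_m)
p_hh = np.mean(np.abs(delta_h) ** 2) - sigma2_est
print(b_est, sigma2_est, np.sqrt(p_hh / np.mean(np.abs(delta_m) ** 2)))
```

With the bias estimated as $P_{hm}/P_{mm}$, the residual is uncorrelated with $\delta_m$ by construction, so $(P_{hh}/P_{mm})^{1/2}$ returns the same bias and the cross correlation coefficient is unity, as stated in the text.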

We begin by first investigating the shot noise when 
each halo has equal weight. The simplest case is that 
of a bin in halo mass, for which we remove the top 
10% of the most massive halos in a simulation and take 
the remaining ones to match a given abundance. As 
shown in figure 1, the prediction $\sigma_n^2 = \bar{n}^{-1}$ is satisfied
for $\bar{n} = 10^{-4}\,(h/\mathrm{Mpc})^3$, but is lower than the simulations at
higher abundances, by a factor of 3 for our highest number
density of $\bar{n} = 4 \times 10^{-3}\,(h/\mathrm{Mpc})^3$. The second possibility
is that of a mass threshold, where all of the halos above
a certain minimum mass cutoff are populated. This gives a somewhat
lower shot noise than mass bins for the same abundance.
Overall, we find that weighting the galaxies uniformly 
leads to a shot noise power that can be larger than the 
usually assumed $\bar{n}^{-1}$, so the standard error analysis may
be overly optimistic and shot noise should be a free pa- 
rameter determined from the data itself. 

Next we investigate the shot noise for non-uniform 
halo dependent weighting $w_i$ for the same mass threshold
sample. We compare the simulations to the expectation
$\sigma_n^2 = V \sum_i w_i^2 / (\sum_i w_i)^2$, where $V$ is the volume and
the sum is over all the halos. At a given number density
this expression is minimized for uniform weighting
(where it equals $\bar{n}^{-1}$), so non-uniform weighting generally
increases the expected shot noise. As argued above,
$w_i = M_i$, where $M_i$ is the halo mass, is the natural
implementation of the idea to enforce mass and momentum
conservation for the halos. The results are shown in figure 1.
We see that the predicted and measured shot noise
amplitudes differ significantly and the difference reaches
a factor of 10-30 at the highest abundance in our simulations,
$\bar{n} = 4 \times 10^{-3}\,(h/\mathrm{Mpc})^3$. This demonstrates that
this is not a simple Poisson sampling of the field and that 
mass and momentum conservation work to suppress the 
shot noise relative to expectations. 
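
The expected (Poisson-like) shot noise for a weighted sample, $V \sum_i w_i^2 / (\sum_i w_i)^2$, is simple to evaluate. The sketch below uses a toy halo sample whose box size, halo count and mass distribution are arbitrary assumptions for illustration, not values from the simulations. It shows that uniform weights give $\bar{n}^{-1}$ while mass weights give a larger expected value.

```python
import numpy as np


def expected_shot_noise(weights, volume):
    """Return V * sum(w_i^2) / (sum(w_i))^2 for a weighted tracer sample."""
    w = np.asarray(weights, dtype=float)
    return volume * np.sum(w ** 2) / np.sum(w) ** 2


# Toy sample (not from the simulations): 4000 halos in a (100 Mpc/h)^3 box,
# with masses drawn from an arbitrary power-law tail above 1e12 Msun/h.
rng = np.random.default_rng(1)
volume = 100.0 ** 3
masses = 1e12 * (1.0 + rng.pareto(1.5, size=4000))

uniform = expected_shot_noise(np.ones_like(masses), volume)
mass_weighted = expected_shot_noise(masses, volume)
print(uniform, 1.0 / (masses.size / volume), mass_weighted)
```

The suppression reported in the text is measured relative to this expectation, which is why it indicates a departure from simple Poisson sampling.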

Other weightings may also improve the results rela- 
tive to naive expectations and may work even better 
for specific applications. For example, weighting by 
$f(M) = M/[1 + (M/10^{14}\,h^{-1} M_\odot)^{0.5}]$, shown in figure 1,
improves upon the mass weighting. This weighting
approximately equals the halo mass weighting over the mass range of
$M < 10^{14}\,h^{-1} M_\odot$, while giving a lower weight to the
higher mass halos relative to the mass weighting. Weight- 
ing by the halo mass gives a very large weight to the most 
massive halos and this non-uniform weighting leads to a 
significant increase in the naive shot noise prediction $\sigma_n^2$
relative to the number density of halos. Therefore, if 
the conservation of mass and momentum is not perfect 
for the most massive halos the residual shot noise may 
still be large, which may explain why downweighting high 
mass halos may work better. On the other hand, simply 
eliminating the halos above $10^{14}\,h^{-1} M_\odot$ while preserving
mass weighting below that mass completely erased any
advantages. We also tried weighting by the halo bias $b$,
which was argued to minimize $\sigma_n^2/P$ [12], and found no
improvements relative to uniform weighting, as expected
since it is close to uniform weighting for most of the halos
and therefore does not implement the mass and momentum
conservation efficiently. It is possible that one may
be able to further improve the signal to noise by optimizing
the weights, but the optimization will depend on the
specific application one has in mind (e.g. non-gaussianity,
redshift space distortions, BAO etc.) and is beyond the
scope of this paper.

FIG. 1: Shot noise power spectrum $\sigma_n^2$ measured in simulations
for uniform weighting of halos in a mass bin and
mass threshold, mass weighting and $f(M) = M/[1 + (M/10^{14}\,h^{-1} M_\odot)^{0.5}]$
weighting, for several different abundances,
corresponding at $z = 0$ to mass thresholds of
$4 \times 10^{13}\,h^{-1} M_\odot$, $1.4 \times 10^{13}\,h^{-1} M_\odot$, $6 \times 10^{12}\,h^{-1} M_\odot$ and
$10^{12}\,h^{-1} M_\odot$, from the lowest to the highest abundance, respectively.
Straight lines (same color/line style) are the expected
shot noise $\sigma_n^2$ for each of the weightings (equal for the
mass bin and mass threshold with uniform weighting).
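
To make the behaviour of the modified mass weighting concrete, the following sketch evaluates $f(M) = M/[1 + (M/10^{14}\,h^{-1} M_\odot)^{0.5}]$ relative to pure mass weighting. The function name and the sample masses are illustrative assumptions. The turnover mass and the exponent are the ones quoted in the text.

```python
import numpy as np

M0 = 1e14  # turnover mass in h^-1 Msun, as quoted in the text


def modified_mass_weight(mass, m0=M0, power=0.5):
    """Return f(M) = M / (1 + (M / m0)**power)."""
    m = np.asarray(mass, dtype=float)
    return m / (1.0 + (m / m0) ** power)


masses = np.array([1e12, 1e13, 1e14, 1e15])
ratio = modified_mass_weight(masses) / masses  # f(M) relative to pure mass weighting
for m, r in zip(masses, ratio):
    print(f"M = {m:.0e} h^-1 Msun: f(M)/M = {r:.3f}")
```

The ratio $f(M)/M$ stays close to unity well below the turnover mass, equals one half at $M = 10^{14}\,h^{-1} M_\odot$ and keeps falling above it, which is the downweighting of the most massive halos described above.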

For actual applications we want to minimize $\sigma_n^2/P$.
Figure 2 shows the results for the same cases as in figure 1.
We see there are significant improvements in $\sigma_n^2/P$ relative
to the uniform weighting and that mass and modified
mass weighting give comparable results, with improvements
in excess of a factor of 10 possible relative to the uniform weighting.
While these results are all at $z = 0$, where we have
the highest density of halos, we also computed them at
higher redshifts. At $z = 0.5$ and $\bar{n} = 3 \times 10^{-4}\,(h/\mathrm{Mpc})^3$,
the target density for SDSS-III, we find a factor of 3-10
improvement at the BAO scale in mass weighting relative to
the uniform weighting, comparable to the $z = 0$ case at the same number
density. This means that the achievable error on
cosmological parameters from BAO can be improved significantly
for the same number of objects measured. Alternatively,
a significantly lower number of objects may
be needed to achieve the same precision and one can reduce
the target number density by nearly a factor of 3.
Note that the SDSS-III plan is to oversample the galaxies at
the BAO scale to use reconstruction to reduce the damping
of BAO, which can be done better if the shot noise is
lower. It is also possible that imposing the local mass and
momentum conservation will minimize systematic shifts
in the BAO position relative to the dark matter that may
otherwise be problematic [13], but we leave this investigation
for the future.

FIG. 2: Same as figure 1 but for $\sigma_n^2/P$. Also shown are the
bias values for the different cases.

FIG. 3: Effects of log-normal scatter $\sigma$ in the halo mass observable
on the shot noise $\sigma_n^2$ for mass and $f(M) = M/[1 + (M/10^{14}\,h^{-1} M_\odot)^{0.5}]$
weights, for $\bar{n} = 3 \times 10^{-4}\,(h/\mathrm{Mpc})^3$ and
$\bar{n} = 4 \times 10^{-3}\,(h/\mathrm{Mpc})^3$. Scatter hardly affects the bias, so the
relative effects of scatter are the same for $\sigma_n^2/P$ and we do
not show them here.

So far we ignored the real world complications such as 
the imprecise knowledge of the halo mass. To investigate 
this we add a log-normal scatter with rms $\sigma$ to
each halo mass and repeat the analysis. Fig. 3 shows
the results for mass and modified mass f(M) weighting: 
for the latter we see that scatter of 50% in mass increases 
$\sigma_n^2/P$ by about 50% for lower abundance and a factor of
2 for higher abundance. Since this is a realistic scatter
for optically selected clusters [14], there is thus a realistic
possibility that we can apply such analysis to the real
data and achieve these gains. In practical applications 
one would try to identify the best halo mass tracer as 
a function of halo mass, for example central galaxy lu- 
minosity in the galactic halos and richness or total lu- 
minosity for the cluster halos. In order to minimize the 
scatter one must understand the relation between the 
galaxy observables and the underlying halos, so progress 
in galaxy formation studies will be needed to maximize 
the gains. We find that for the mass weighting the scatter
has a larger effect, such that for $\sigma = 0.5$ the degradation
in $\sigma_n^2/P$ is a factor of 2-3. Once the scatter becomes too
large there is no longer any local mass and momentum
conservation and we find that for $\sigma = 1$ the shot noise
is worse than for uniform weighting. Another potential 
complication is the effect of redshift space distortions, 
since the observed radial distance is a sum of the true ra- 
dial distance and peculiar velocity (divided by the Hubble 
parameter). We find a modest (50%) increase in $\sigma_n^2/P$,
where P in redshift space is the spherically averaged (i.e. 
monopole) power spectrum. Since redshift space con- 
tains much more information than just the monopole it 
is possible that one may be able to use the additional 
information to reduce this degradation and we leave this 
for a future investigation. 
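
The scatter test described above can be mimicked with a few lines. This is a sketch under stated assumptions rather than the procedure used for the simulations: it assumes the scatter is applied as a zero-mean Gaussian of rms $\sigma$ in $\ln M$, and the function name and toy halo sample are hypothetical.

```python
import numpy as np


def scatter_masses(mass, sigma, rng):
    """Return masses perturbed by log-normal scatter of rms sigma in ln M."""
    m = np.asarray(mass, dtype=float)
    return m * np.exp(sigma * rng.standard_normal(m.shape))


rng = np.random.default_rng(2)
true_mass = np.full(100_000, 1e13)  # toy sample of identical halos
observed = scatter_masses(true_mass, 0.5, rng)

# The rms of ln(M_obs / M_true) should recover the input scatter.
print(np.std(np.log(observed / true_mass)))
```

The scattered masses would then replace the true masses in the weights before the shot noise is measured again.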

These results are particularly relevant for the multi- 
tracer methods where the data are analyzed in terms of 
ratios of different tracers and for which the sampling vari- 
ance error cancels, such as those recently proposed for 
non-gaussianity @, redshift space distortions and Hubble
versus angular distance relation. For these there
is no lower limit on the achievable error, which decreases as long
as $\sigma_n^2/P$ decreases, and the method proposed here could
lead to a significant reduction of errors relative to previous
expectations. We see from figure 2 that for mass
weighting at $4 \times 10^{-3}\,(h/\mathrm{Mpc})^3$, $\sigma_n^2/P \sim 10^{-3}$ on large
scales, so this could give a signal to noise of 30 for a single
mode, compared to 0.7 for the single tracer method,
equivalent to 3 orders of magnitude reduction in volume 
needed to reach the same precision. Note that this is 
not unreachable, since the existing SDSS survey achieves 
$\bar{n} \sim 10^{-2}\,(h/\mathrm{Mpc})^3$ for the redshift survey of the main
sample. 
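
The numbers quoted above can be checked with simple arithmetic. The single tracer value follows from the error expression $\sigma_P/P = (2/N)^{1/2}(1 + \sigma_n^2/P)$ with $N = 1$ and negligible shot noise. For the multi-tracer case the sketch assumes a per-mode scaling of signal to noise $\sim (\sigma_n^2/P)^{-1/2}$, which is an assumption adopted here because it reproduces the quoted value of about 30, not a formula given in the text.

```python
import math

noise_to_power = 1e-3  # sigma_n^2 / P quoted for mass weighting in the text

# Single tracer, one mode (N = 1), negligible shot noise:
# S/N = 1 / (sigma_P / P) = 1 / sqrt(2).
snr_single = 1.0 / math.sqrt(2.0)

# Assumed multi-tracer scaling per mode: S/N ~ (sigma_n^2 / P)^(-1/2).
snr_multi = noise_to_power ** -0.5

# S/N grows as sqrt(number of modes), and the number of modes scales with volume.
volume_ratio = (snr_multi / snr_single) ** 2

print(snr_single, snr_multi, volume_ratio)
```

Under these assumptions the volume ratio comes out at about $2 \times 10^3$, in line with the three orders of magnitude quoted above.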

Equally impressive improvements may be possible for 
future redshift surveys such as JDEM/EUCLID or Big- 
BOSS, which are expected to operate at redshifts up to 
$z \sim 2$. Their target number density could be as high
as $\bar{n} \sim 10^{-3}\,(h/\mathrm{Mpc})^3$ or higher, and the method proposed
here could lead to a dramatic reduction of errors or,
equivalently, to a several-fold reduction in the number of 
measured redshifts required to reach the target precision, 
with potentially important implications for the design of 
these missions. The weights can be further optimized for 
specific applications, especially for the multi-tracer methods
that cancel out the sampling variance error. This
approach holds the promise to become the most accurate 
method to extract both the primordial non-gaussianity 
and the dark energy equation of state and its full promise 
should be explored further with more realistic simulations.
In parallel we should improve our understanding
of galaxy formation to relate the galaxy observables
to the underlying halo mass with as little scatter as
possible. 

We thank V. Springel for making the Gadget-2 code 
available to us and P. McDonald, D. Eisenstein and R. 
Smith for useful comments. This work is supported by 